Swish function

The swish function is a family of mathematical functions defined as follows:

\operatorname{swish}_\beta(x) = x \operatorname{sigmoid}(\beta x) = \frac{x}{1+e^{-\beta x}}.

where \beta can be a constant (usually set to 1) or a trainable parameter, and "sigmoid" refers to the logistic function.
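As a quick illustration (not part of the original article), the definition translates directly into code; the following Python/NumPy sketch assumes the function name swish and a default \beta = 1:

import numpy as np

def swish(x, beta=1.0):
    # swish_beta(x) = x * sigmoid(beta * x) = x / (1 + exp(-beta * x))
    return x / (1.0 + np.exp(-beta * x))

print(swish(np.array([-2.0, 0.0, 2.0])))        # beta = 1: the SiLU
print(swish(np.array([-2.0, 0.0, 2.0]), 0.5))   # smaller beta: closer to x/2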

The swish family was designed to interpolate smoothly between a linear function and the ReLU function.

When considering positive values, swish is a particular case of the doubly parameterized sigmoid shrinkage function defined in a 2008 paper on sigmoid shrinkage (ISBN 9781424414833). Variants of the swish function include Mish.


Special values
For β = 0, the function is linear: f(x) = x/2.

For β = 1, the function is the Sigmoid Linear Unit (SiLU).

As β → ∞, the function converges to the ReLU function.

Thus, the swish family interpolates smoothly between a linear function and the ReLU function.

Since \operatorname{swish}_\beta(x) = \operatorname{swish}_1(\beta x) / \beta, all instances of swish have the same shape as the default \operatorname{swish}_1, rescaled by \beta. One usually sets \beta > 0. When \beta is trainable, this constraint can be enforced by \beta = e^b, where b is trainable.
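A short numerical check of these special values and of the rescaling identity, reusing the swish sketch above (NumPy assumed; the sample points and tolerances are arbitrary choices for the example):

import numpy as np

def swish(x, beta=1.0):
    return x / (1.0 + np.exp(-beta * x))

x = np.linspace(-4.0, 4.0, 9)

# beta = 0 gives the linear function x/2
assert np.allclose(swish(x, 0.0), x / 2)

# large beta approaches ReLU(x) = max(x, 0)
assert np.allclose(swish(x, 50.0), np.maximum(x, 0.0), atol=1e-6)

# swish_beta(x) = swish_1(beta * x) / beta
beta = 2.5
assert np.allclose(swish(x, beta), swish(beta * x, 1.0) / beta)

# for a trainable beta > 0, one can parameterize beta = exp(b) with unconstrained b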

Near the origin, \operatorname{swish}_1 has the expansion

\operatorname{swish}_1(x) = \frac x2 + \frac{x^2}{4}-\frac{x^4}{48}+\frac{x^6}{480}+O\left(x^8\right).

It also satisfies the identities

\begin{aligned} \operatorname{swish}_1(x) &= \frac{x}{2} \tanh \left(\frac{x}{2}\right) + \frac{x}{2}\\ \operatorname{swish}_1(x) + \operatorname{swish}_{-1}(x) &= x \\ \operatorname{swish}_1(x) - \operatorname{swish}_{-1}(x) &= x \tanh \left(\frac{x}{2}\right) \end{aligned}
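These identities and the series expansion can be verified numerically; the sketch below (NumPy assumed) checks them at a few sample points:

import numpy as np

def swish(x, beta=1.0):
    return x / (1.0 + np.exp(-beta * x))

x = np.linspace(-2.0, 2.0, 7)

# swish_1(x) = (x/2) tanh(x/2) + x/2
assert np.allclose(swish(x), 0.5 * x * np.tanh(0.5 * x) + 0.5 * x)

# swish_1(x) + swish_{-1}(x) = x,  swish_1(x) - swish_{-1}(x) = x tanh(x/2)
assert np.allclose(swish(x) + swish(x, -1.0), x)
assert np.allclose(swish(x) - swish(x, -1.0), x * np.tanh(0.5 * x))

# series near 0: x/2 + x^2/4 - x^4/48 + x^6/480 + O(x^8)
t = np.linspace(-0.3, 0.3, 5)
assert np.allclose(swish(t), t / 2 + t**2 / 4 - t**4 / 48 + t**6 / 480, atol=1e-6)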


Derivatives
Because \operatorname{swish}_\beta(x) = \operatorname{swish}_1(\beta x) / \beta, it suffices to calculate its derivatives for the default case \beta = 1.

\operatorname{swish}_1'(x) = \frac{x+\sinh (x)}{4 \cosh ^2\left(\frac{x}{2}\right)} + \frac 12, so \operatorname{swish}_1'(x) - \frac 12 is odd.

\operatorname{swish}_1''(x) = \frac{1- \frac x2 \tanh \left(\frac{x}{2}\right)}{2 \cosh ^2\left(\frac{x}{2}\right)}, so \operatorname{swish}_1''(x) is even.
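A finite-difference sanity check of both closed forms (a sketch, NumPy assumed; step sizes and tolerances are arbitrary):

import numpy as np

def swish(x):
    return x / (1.0 + np.exp(-x))

def swish_d1(x):
    # swish_1'(x) = (x + sinh x) / (4 cosh^2(x/2)) + 1/2
    return (x + np.sinh(x)) / (4.0 * np.cosh(0.5 * x) ** 2) + 0.5

def swish_d2(x):
    # swish_1''(x) = (1 - (x/2) tanh(x/2)) / (2 cosh^2(x/2))
    return (1.0 - 0.5 * x * np.tanh(0.5 * x)) / (2.0 * np.cosh(0.5 * x) ** 2)

x = np.linspace(-3.0, 3.0, 13)

# central differences agree with the closed-form first and second derivatives
h = 1e-5
assert np.allclose(swish_d1(x), (swish(x + h) - swish(x - h)) / (2 * h), atol=1e-6)

h = 1e-4
assert np.allclose(swish_d2(x), (swish(x + h) - 2 * swish(x) + swish(x - h)) / h**2, atol=1e-5)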


History
SiLU was first proposed alongside the GELU in 2016, and was proposed again in 2017 as the sigmoid-weighted linear unit (SiL) in reinforcement learning. Over a year after its initial introduction, the SiLU/SiL was proposed once more under the name SWISH, originally without the learnable parameter β, so that β implicitly equaled 1; the swish paper was later updated to propose the activation with the learnable parameter β.

In 2017, after performing analysis on ImageNet data, researchers from Google indicated that using this function as an activation function in artificial neural networks improves performance compared to ReLU and sigmoid functions. It is believed that one reason for the improvement is that the swish function helps alleviate the vanishing gradient problem during backpropagation.

